In this paper, a simple rule that generates scale-free networks with a very large clustering
coefficient and a very small average distance is presented. These networks are called Multistage
Random Growing Networks (MRGN), as the process of adding a new node to the network is composed
of two stages. The analytic results of power-law exponent $\gamma = 3$ and clustering coefficient
$C = 0.81$ are obtained, which agree approximately with the simulation results. In addition, it is
proved analytically that the average distance of the networks increases logarithmically with the
number of network vertices. Since many real-life networks are both scale-free and small-world
networks, MRGN may perform well in mimicking reality.

PACS numbers: 89.75.Da, 89.75.Fb, 89.75.Hc 



The past few years have witnessed a great devotion by physicists to understanding and
characterizing the underlying mechanisms of complex networks, including the Internet, the
World Wide Web, scientific collaboration networks and so on. The results of many experiments
and statistical analyses indicate that networks in various fields share some common
characteristics: a small average distance like random graphs, a large clustering coefficient
and a power-law degree distribution [2], which are called the small-world and scale-free
characteristics. Recent work on the mathematics of networks has been driven largely by the
empirical properties of real-life networks and by studies on network dynamics [27, 28, 29, 30],
optimization [32, 33, 36] and evolution. The first successful attempt to generate networks with
high clustering coefficients and small average distance is that of Watts and Strogatz (WS model).
Another significant model, the scale-free network (BA network), was proposed by Barabási and
Albert [2]. The BA model suggests that growth and preferential attachment are the two main
self-organization mechanisms of the scale-free network structure. These point to the fact that
many real-world networks grow continuously through the addition of new nodes and edges, and
that new nodes tend to attach to existing nodes with a large number of neighbors.

Dorogovtsev et al. proposed a simple model of scale-free growing networks for any network
size [42]. The idea of the model is that a new node is added to the network at each time step
and connects to both ends of a randomly chosen undirected link. The model can equivalently be
described as a process in which the newly added node connects to node $i$ preferentially and
then selects a neighbor of node $i$ at random. Holme et al. proposed the well-known model that
generates growing scale-free networks with tunable clustering. The model introduced an
additional triad formation step and demonstrated that the average number of triad formation
trials controls the clustering coefficient of the network. It should be noticed that in these
models only the first node $i$ is chosen preferentially. Actually, the new node would also like
to connect to the neighbors of node $i$ preferentially. Inspired by these questions, we propose
the multistage random growing networks model. At each time step, the new node attaches to an
existing node preferentially, and then connects preferentially to one of that node's neighbors.

A scale-free small-world network generated by a very simple rule is presented. The network
starts with a triangle containing three nodes marked I, II and III. At each time step, a new
node is added to the network with two edges. The first edge attaches to an existing node $i$
chosen with probability proportional to its degree $k_i$, that is, with probability
$k_i/\sum_j k_j$; the second edge then attaches preferentially to one of the neighbors of the
first selected node. According to this process, the general iterative algorithm of MRGN is
introduced. $A(t)$ denotes MRGN after $t$ iterations. Since the network size increases by one
at each time step, $t$ is also used to label the node added in the $t$th step. At step $t$, the
network consists of $N = t + 3$ vertices, and the total degree equals $4t + 6$. When $t$ is
large, the average degree is approximately equal to the constant value 4, which shows that MRGN
is sparse like many real-life networks. The topological characteristics of the model are
analyzed both analytically and by numerical calculations. The analytical expressions agree
approximately with the numerical simulations.
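
The growth rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: the function name `grow_mrgn`, the integer labels 0, 1, 2 for the initial triangle (nodes I, II and III), and the reading of "preferentially" in the second stage as "with probability proportional to the neighbor's degree" are all assumptions made here.

```python
import random

def grow_mrgn(steps, seed=None):
    """Grow an MRGN-like network; returns {node: set of neighbors}."""
    rng = random.Random(seed)
    # Initial triangle; labels 0, 1, 2 stand for nodes I, II, III.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    # Each node appears in `ends` once per incident edge, so a uniform
    # draw from this list picks node i with probability k_i / sum_j k_j.
    ends = [0, 1, 0, 2, 1, 2]
    for new in range(3, 3 + steps):
        # Stage 1: degree-proportional choice of the first target.
        i = rng.choice(ends)
        # Stage 2 (assumed reading): degree-proportional choice among
        # the neighbors of i.
        nbrs = sorted(adj[i])
        j = rng.choices(nbrs, weights=[len(adj[n]) for n in nbrs])[0]
        adj[new] = {i, j}
        adj[i].add(new)
        adj[j].add(new)
        ends.extend([i, new, j, new])
    return adj

net = grow_mrgn(2000, seed=1)
n_nodes = len(net)
mean_degree = sum(len(nbrs) for nbrs in net.values()) / n_nodes
print(n_nodes, round(mean_degree, 3))  # mean degree is just below 4
```

Keeping the flat list `ends` avoids recomputing the degree sum at every step; the second-stage draw is the only place where the neighbor degrees are needed.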

The degree distribution is one of the most important statistical characteristics of networks.
Since many real-world networks are scale-free, whether the network has a power-law degree
distribution is a criterion for judging the validity of the model. By using mean-field theory,
the evolution of the degree of an individual node can be described as follows:

$$\frac{dk_i}{dt} = P(i) + \sum_{j \in \Gamma_i} P(i|j)\,P(j), \tag{1}$$

where $P(i)$ denotes the probability that node $i$ with degree $k_i$ is selected in the first
step, $P(i|j)$ denotes the conditional probability that node $i$ is selected in the second step
given that its neighbor $j$, with degree $k_j$, has been selected in the first step, and
$\Gamma_i$ denotes the set of neighbors of node $i$. Because the new node is attached to the
network preferentially, one has

$$P(i) = \frac{k_i}{\sum_{j=1}^{N} k_j}. \tag{2}$$



The conditional probability $P(i|j)$ can be calculated by

$$P(i|j) = \frac{k_i}{\sum_{l \in \Gamma_j} k_l}. \tag{3}$$

Since every newly added node has two edges, $P(i|j)$ can be approximated by
$P(i|j) \approx 1/k_j$. Then, one gets



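$$\frac{dk_i}{dt} = \frac{2k_i}{\sum_j k_j}. \tag{4}$$

The sum in the denominator goes over all nodes in the network except the newly introduced one;
thus its value is $\sum_j k_j = 2(2t + 3)$, twice the number of edges. For large $t$, the
solution of Eq. (4), with the initial condition that every node $i$ has $k_i(t_i) = 2$ at its
introduction, is

$$k_i(t) = 2\left(\frac{t}{t_i}\right)^{\beta}, \tag{5}$$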
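
The step from the rate equation to the power-law growth of $k_i(t)$ can be spelled out as follows. This is a sketch that assumes the conditional probability is approximated by a uniform choice among the neighbors of $j$, $P(i|j) \approx 1/k_j$, and that the total degree is replaced by its large-$t$ value $\sum_l k_l \approx 4t$ (each step adds two edges). Both simplifications are assumptions of this illustration, chosen because they reproduce the exponent 0.5 of the growth law.

```latex
\begin{align*}
\sum_{j\in\Gamma_i} P(i|j)\,P(j)
  &\approx \sum_{j\in\Gamma_i} \frac{1}{k_j}\,\frac{k_j}{\sum_l k_l}
   = \frac{k_i}{\sum_l k_l}, \\
\frac{dk_i}{dt}
  &\approx \frac{2k_i}{\sum_l k_l} \approx \frac{k_i}{2t}, \\
\int_{2}^{k_i(t)} \frac{dk}{k}
  &= \frac{1}{2}\int_{t_i}^{t} \frac{d\tau}{\tau}
  \quad\Longrightarrow\quad
  k_i(t) = 2\left(\frac{t}{t_i}\right)^{1/2}.
\end{align*}
```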



where $\beta = 0.5$. One can get that the degree distribution of MRGN is as follows:

$$P(k) \sim k^{-\gamma}, \tag{6}$$

where $\gamma = 1/\beta + 1 = 3$. The numerical simulation results are shown in Fig. 1.

FIG. 1: Degree distribution of MRGN, with $N = 20000$ (hexagons), $N = 15000$ (pentagons),
$N = 10000$ (diamonds) and $N = 5000$ (squares). In this figure, $p(k)$ denotes the probability
that a node in the network has degree $k$. The power-law exponents $\gamma$ of the four
probability density functions are $\gamma_{25000} = 2.88 \pm 0.02$, $\gamma_{20000} = 2.88 \pm 0.05$,
$\gamma_{15000} = 2.86 \pm 0.06$ and $\gamma_{10000} = 2.85 \pm 0.02$.
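
The exponent can be read off from the growth law $k_i(t) = 2(t/t_i)^{1/2}$ by the standard mean-field argument used for the BA model. The sketch below assumes that the introduction times $t_i$ are uniformly distributed over the $t + 3$ nodes present at time $t$; this assumption belongs to the usual continuum treatment and is introduced here only for illustration.

```latex
\begin{align*}
P\bigl(k_i(t) < k\bigr)
  &= P\!\left(t_i > \frac{4t}{k^{2}}\right)
   = 1 - \frac{4t}{k^{2}\,(t+3)}, \\
P(k) &= \frac{\partial P\bigl(k_i(t) < k\bigr)}{\partial k}
   = \frac{8t}{t+3}\,k^{-3} \;\propto\; k^{-3},
\end{align*}
```

so that $\gamma = 1/\beta + 1 = 3$ for $\beta = 1/2$.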



As mentioned above, the degree distribution is one of the most important statistical
characteristics of networks. The average distance is also one of the most important parameters
for measuring the efficiency of a communication network. The average distance $L$ of the
network is defined as the mean distance over all pairs of nodes, and it plays a significant
role in measuring the transmission delay. Label each node of the network according to the time
at which it is added to the network. First, we give the following lemma [37].

Lemma 1. For any two nodes $i$ and $j$, each shortest path from $i$ to $j$ does not pass
through any node $k$ satisfying $k > \max\{i, j\}$.

Proof. Denote the shortest path from node $i$ to $j$ of length $n + 1$ by
$i \to x^1 \to x^2 \to \cdots \to x^n \to j$ ($SP_{ij}$), where $n > 0$. Suppose that
$x^k = \max\{x^1, x^2, \cdots, x^n\}$; if $x^k < \max\{i, j\}$, then the conclusion is true.

Then we prove that the case $x^k > \max\{i, j\}$ cannot occur. Suppose the edge $E_{y_1 y_2}$
is selected when node $x^k$ is added. If $x^k > \max\{i, j\}$, node $x^k$ is neither $i$ nor
$j$, and, being the youngest node on the path, it can only be reached through the older nodes
$y_1$ and $y_2$. Hence the path from $i$ to $j$ passing through $x^k$ must enter and leave
through the two ends of $E_{y_1 y_2}$. Assume that the path enters at node $y_1$ and leaves
from node $y_2$; then $SP_{ij}$ contains a sub-path from $y_1$ to $y_2$ passing through $x^k$,
which is longer than the direct path $y_1 \to y_2$, a contradiction. Therefore the youngest
node must be either $i$ or $j$ when $SP_{ij}$ is the shortest path.
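
Lemma 1 can be checked numerically on a small network. The sketch below is an illustration under stated assumptions rather than part of the proof: `grow` is a hypothetical helper that labels nodes 0, 1, 2, ... by arrival order and reads the second-stage "preferential" choice as degree-proportional, and the check tests the consequence of the lemma that distances among the first $n$ nodes are the same whether or not younger nodes may be used.

```python
import random
from collections import deque

def grow(steps, seed=0):
    """Hypothetical MRGN-like growth; nodes are labeled by arrival order."""
    rng = random.Random(seed)
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    ends = [0, 1, 0, 2, 1, 2]  # one entry per edge end
    for new in range(3, 3 + steps):
        i = rng.choice(ends)  # degree-proportional first target
        nbrs = sorted(adj[i])
        j = rng.choices(nbrs, weights=[len(adj[n]) for n in nbrs])[0]
        adj[new] = {i, j}
        adj[i].add(new)
        adj[j].add(new)
        ends.extend([i, new, j, new])
    return adj

def distances(adj, src, limit):
    """BFS distances from src using only nodes with label < limit."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v < limit and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def old_distances_unchanged(adj, n):
    """True if distances among nodes 0..n-1 ignore all younger nodes."""
    for src in range(n):
        restricted = distances(adj, src, n)
        full = distances(adj, src, len(adj))
        if any(restricted[v] != full[v] for v in range(n)):
            return False
    return True

net = grow(300, seed=4)
print(all(old_distances_unchanged(net, n) for n in (3, 20, 100, 200)))
```

The restricted search is always complete because every node attaches to older nodes, so the first $n$ nodes form a connected network on their own.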

Denote by $d(i, j)$ the distance between node $i$ and node $j$. Let $\sigma(N)$ represent the
total distance $\sigma(N) = \sum_{1 \le i < j \le N} d(i, j)$. The average distance of MRGN with
order $N$, denoted by $L(N)$, is defined as follows:

$$L(N) = \frac{2\sigma(N)}{N(N-1)}. \tag{7}$$

According to Lemma 1, a newly added node does not affect the distances between old nodes.
Hence we have

$$\sigma(N+1) = \sigma(N) + \sum_{i=1}^{N} d(i, N+1). \tag{8}$$

Assume that the $(N+1)$th node is added to the edge $E_{y_1 y_2}$; then Eq. (8) can be written as

$$\sigma(N+1) = \sigma(N) + N + \sum_{i=1}^{N} D(i, y), \tag{9}$$

where $D(i, y) = \min\{d(i, y_1), d(i, y_2)\}$. Contracting the edge $E_{y_1 y_2}$ into a single
node $y$, we have the following equation:

$$\sigma(N+1) = \sigma(N) + N + \sum_{i \in \Lambda} d(i, y), \tag{10}$$

where the node set $\Lambda = \{1, 2, \cdots, N\} - \{y_1, y_2\}$ has $N - 2$ members. The sum
$\sum_{i \in \Lambda} d(i, y)$ can be considered as the total distance from each node of the
network to node $y$ in MRGN with order $N - 1$. Approximately, the mean of $d(i, y)$ over
$i \in \Lambda$ is equal to $L(N-1)$. Hence we have

$$\sum_{i \in \Lambda} d(i, y) \approx (N-2)\,L(N-1). \tag{11}$$



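Because the average distance $L(N)$ increases monotonically with $N$, this yields

$$(N-2)\,L(N-1) = \frac{(N-2)\,2\sigma(N-1)}{(N-1)(N-2)} < \frac{2\sigma(N)}{N-1}. \tag{12}$$

Then we can obtain the inequality

$$\sigma(N+1) < \sigma(N) + N + \frac{2\sigma(N)}{N-1}. \tag{13}$$

Enlarging $\sigma(N)$, the upper bound of the increasing tendency of $\sigma(N)$ is obtained
from the following equation:

$$\frac{d\sigma(N)}{dN} = N + \frac{2\sigma(N)}{N-1}. \tag{14}$$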
This leads to the following solution:

$$\sigma(N) = (N-1)^2 \log(N-1) + C_1 (N-1)^2 - (N-1). \tag{15}$$

FIG. 2: The dependence between the average distance $L$ and the order $N$ of MRGN. One can see
that $L$ increases very slowly as $N$ increases. The inset exhibits the curve where $L$ is
considered as a function of $\ln N$, which is fitted by a straight line. The curve is above the
fitting line when $N$ is small ($2000 < N < 7000$) and under the fitting line when $N$ is large
($N > 8000$), which indicates that the increasing tendency of $L$ can be approximated as
$\ln N$ and is in fact a little slower than $\ln N$. All the data are obtained from 10
independent simulations.
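
That the closed form for $\sigma(N)$ solves the differential equation $d\sigma/dN = N + 2\sigma/(N-1)$, and what it implies for $L(N)$, can be verified directly. The substitution $x = N - 1$, $\sigma = x^{2}u(x)$ is an illustrative choice made here, not necessarily the route taken in the original derivation.

```latex
\begin{align*}
\frac{d\sigma}{dx} &= (x + 1) + \frac{2\sigma}{x}, \qquad x = N - 1, \\
\frac{du}{dx} &= \frac{1}{x} + \frac{1}{x^{2}}
  \quad\text{for } \sigma = x^{2}u(x), \qquad
  u = \ln x - \frac{1}{x} + C_1, \\
\sigma(N) &= (N-1)^{2}\ln(N-1) + C_1 (N-1)^{2} - (N-1), \\
L(N) &= \frac{2\sigma(N)}{N(N-1)}
  = \frac{2(N-1)}{N}\bigl[\ln(N-1) + C_1\bigr] - \frac{2}{N}
  \;\sim\; 2\ln N \quad (N \to \infty).
\end{align*}
```

Since $\sigma(N)$ was enlarged to obtain the differential equation, this is an upper bound: the average distance grows at most logarithmically with $N$.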

By means of this approximate theoretical calculation, we show that the growth of $L(N)$ is
bounded by a logarithmic law, that is, it is a little slower than $\ln N$. In Fig. 2, we report
the simulation results on the average distance of MRGN, which agree with the analytic result.

FIG. 3: The clustering coefficient of MRGN (red diamonds) and Holme-Kim networks (green
squares). In this figure, one can find that the clustering coefficient of MRGN is almost a
constant, a little smaller than 0.75. The red line represents the analytic result 0.81. It is
clear that the clustering coefficient of Holme-Kim networks is much smaller than that of MRGN.



The small-world effect consists of two properties: a large clustering coefficient and a small
average distance. The clustering coefficient, denoted by $C$, is defined as
$C = \frac{1}{N}\sum_{i=1}^{N} C_i$, where $C_i$ is the local clustering coefficient of an
arbitrary node $i$:

$$C_i = \frac{2E(i)}{k_i(k_i - 1)}, \tag{16}$$



where $E(i)$ is the number of edges among the neighbors of node $i$, and $k_i$ is the degree of
node $i$. When node $i$ is added to the network, it has degree 2 and $E(i) = 1$. If a new node
is added as a neighbor of $i$ at some time step, $E(i)$ increases by one, since the newly added
node also links to one of the neighbors of node $i$. Therefore, in terms of $k_i$, the
expression of $E(i)$ can be written as

$$E(i) = 1 + (k_i - 2) = k_i - 1. \tag{17}$$

Hence, we have

$$C_i = \frac{2(k_i - 1)}{k_i(k_i - 1)} = \frac{2}{k_i}. \tag{18}$$
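
The relations $E(i) = k_i - 1$ and $C_i = 2/k_i$ follow from the growth rule alone, so they can be checked exactly on a simulated network. The sketch below is illustrative: `grow` is a hypothetical helper implementing the two-stage rule with a degree-proportional second stage (an assumed reading of "preferentially"), and the helper names are not taken from the paper.

```python
import random

def grow(steps, seed=0):
    """Hypothetical MRGN-like growth starting from a triangle."""
    rng = random.Random(seed)
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    ends = [0, 1, 0, 2, 1, 2]  # one entry per edge end
    for new in range(3, 3 + steps):
        i = rng.choice(ends)  # degree-proportional first target
        nbrs = sorted(adj[i])
        j = rng.choices(nbrs, weights=[len(adj[n]) for n in nbrs])[0]
        adj[new] = {i, j}
        adj[i].add(new)
        adj[j].add(new)
        ends.extend([i, new, j, new])
    return adj

def edges_among_neighbors(adj, i):
    """E(i): number of edges joining two neighbors of node i."""
    nbrs = adj[i]
    # Each such edge is counted from both of its ends, hence the halving.
    return sum(len(adj[u] & nbrs) for u in nbrs) // 2

def local_clustering(adj, i):
    """C_i = 2 E(i) / (k_i (k_i - 1))."""
    k = len(adj[i])
    return 2 * edges_among_neighbors(adj, i) / (k * (k - 1))

net = grow(3000, seed=2)
# E(i) = k_i - 1 should hold for every node, not just on average.
assert all(edges_among_neighbors(net, i) == len(net[i]) - 1 for i in net)
average_c = sum(local_clustering(net, i) for i in net) / len(net)
print(round(average_c, 3))
```

The printed average is a finite-size estimate of $C$ and depends on the seed and on the network size.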



This expression indicates that the local clustering scales as $C_i \sim k_i^{-1}$. It is
interesting that a similar scaling has been observed in the pseudofractal web and in several
real-life networks. Consequently, we have

$$C = \frac{2}{N}\sum_{i=1}^{N} \frac{1}{k_i}. \tag{19}$$



Since the degree distribution is $p(k) = c_1 k^{-3}$, where $k = 2, 3, \cdots, k_{max}$, the
average clustering coefficient $C$ can be rewritten as

$$C = \sum_{k=2}^{k_{max}} \frac{2}{k}\,p(k) = 2c_1 \sum_{k=2}^{k_{max}} k^{-4}. \tag{20}$$

For sufficiently large $N$, $k_{max} \gg 2$. The parameter $c_1$ satisfies the normalization
equation

$$\sum_{k=2}^{k_{max}} c_1 k^{-3} = 1. \tag{21}$$

It can be obtained that $c_1 = 4.9491$ and $C \approx 4.9491 \times \sum_{k=2}^{k_{max}} 2k^{-4}
= 0.8149$. From Fig. 3, one can see that the analytical average clustering coefficient deviates
slightly from the simulated value. One reason is that the analytic value is obtained in the
limit $t \to \infty$, whereas the simulation result is obtained at finite $t$. Another reason is
that the simulated exponent $\gamma$ of the degree distribution deviates slightly from 3, which
is caused by the finite network size. However, the most important reason lies in the hypothesis
that there are no correlations between nodes. Empirical evidence shows that most real-life
networks have a large clustering coefficient no matter how many nodes they have. This agrees
with the case of MRGN but conflicts with the case of BA networks; thus MRGN may be more
appropriate to mimic reality.
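
The two constants quoted above can be reproduced by summing the series numerically. The sketch below is only an arithmetic check; the function name `clustering_constants` and the cutoff `k_max = 100000` are choices made here, with the cutoff standing in for the limit $k_{max} \gg 2$.

```python
def clustering_constants(k_max=100_000):
    """Return (c1, C) for p(k) = c1 * k**-3 on k = 2..k_max."""
    # Normalization: sum_{k=2}^{k_max} c1 * k**-3 = 1.
    norm = sum(k ** -3 for k in range(2, k_max + 1))
    c1 = 1.0 / norm
    # C = sum_k (2 / k) * p(k) = 2 * c1 * sum_k k**-4.
    c = 2.0 * c1 * sum(k ** -4 for k in range(2, k_max + 1))
    return c1, c

c1, c = clustering_constants()
print(round(c1, 4), round(c, 4))
```

Both sums converge quickly, so the result is insensitive to the cutoff once `k_max` is in the thousands.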

In summary, we have introduced a simple iterative algorithm for constructing MRGN. The
networks have very large clustering coefficients and very small average distance, in agreement
with the characteristics of many real networks, such as technological and social networks.
After the newly added node connects to the first node $i$, it connects preferentially to a
neighbor of node $i$. The resulting networks are not only scale-free networks but also
small-world networks. The results imply the following conclusion: if there are no correlations
between nodes and the new node is added to the network in two steps, then, whether the second
step is random or preferential, the degree distribution is a power law with exponent 3. We have
computed the analytical expressions for the degree distribution and the clustering coefficient.
Since most real-life networks are both scale-free and small-world networks, MRGN may perform
better in mimicking reality. Further work should focus on information flow and epidemic
spreading on MRGN.